feat: surface gate detail in the workflow run/resume --json payload#2965
feat: surface gate detail in the workflow run/resume --json payload#2965doquanghuy wants to merge 9 commits into
Conversation
|
@mnriem when you have a moment, would appreciate your thoughts on the direction here — the issue lists the alternatives considered, and I'm happy to rework toward whichever shape fits Spec Kit best. |
There was a problem hiding this comment.
Pull request overview
This PR extends the workflow CLI’s --json run/resume outcome payload to include structured details when the run is paused at a gate step, enabling external orchestrators to detect “human review needed” without parsing stdout.
Changes:
- Record each executed step’s
typeinto persistedstep_resultsso step types are recoverable from run state. - Add an optional
gateblock to theworkflow run --json/workflow resume --jsonpayload when the current step is a gate. - Add CLI-level tests covering a non-interactive gate pause (includes
gateblock) and a non-gate completed run (nogatekey).
Show a summary per file
| File | Description |
|---|---|
tests/test_workflows.py |
Adds CLI-level tests asserting --json includes a structured gate block on gate pauses and omits it for a normal completed run. |
src/specify_cli/workflows/engine.py |
Persists type in each step’s recorded step_results entry so step-type introspection is possible from run state. |
src/specify_cli/__init__.py |
Builds the --json outcome payload and conditionally injects gate details via a helper when the current step is a gate. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 2
|
Please address Copilot feedback |
Address review (github#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@mnriem Thanks for the review — addressed the Copilot feedback:
Full suite green; |
|
@mnriem Pushed
|
|
Please address Copilot feedback and rebase on upstream/main |
|
@mnriem Pushed
Full |
|
@mnriem Pushed
Full Heads up: GitHub now shows this branch as conflicting with |
A paused run was indistinguishable from any other pause in the machine-readable outcome, and the gate's prompt/options/choice never left the human-facing stream. Record each step's type in the run state's step results (one engine line) and, when the run sits at a gate, add a gate block (step_id/message/options/choice) to the payload so orchestrators can drive review gates without parsing stdout. Reference implementation for the proposal in github#2964. Addresses github#2964
Address review (github#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Address Copilot review: - `_gate_outcome` now also surfaces the gate block when a run is `aborted` by a gate rejection (`on_reject: abort`), not only when `paused`. Abort is the only path that sets ABORTED and it leaves current_step_id on the gate, so an orchestrator can read the recorded `choice` for the stop. - Coerce `message` to a string (it may be a non-string YAML literal that GateStep only coerces for interpolation) so the JSON schema stays stable. - Tests: add a CLI-level aborted-path test, a message-coercion test, and extend the suppression test to allow `aborted`; share the run helper via `_invoke_json` to avoid duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address Copilot review: the gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would surface as an opaque JSON decode error. Route it through `_run_json` (which asserts exit_code == 0 before parsing) and drop the now-redundant `_invoke_json` helper — a gate abort emits the payload and returns, so the run exits 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address Copilot review: - `_run_json` asserted with `result.stdout` in the message, but under `--json` step output is redirected off stdout — the useful diagnostics live on `result.output`. Switch the assertion message to `result.output` (the JSON parse still reads stdout), matching the other CLI tests. - `StepContext.steps` documented a 5-key entry shape; the engine now also persists `type` and `status`. Update the docstring to the canonical 7-key shape so step authors/debuggers see the real record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After rebasing onto main, a gate abort now emits the --json payload and then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the merged exit-code work). Give `_run_json` an `expected_exit` parameter (default 0) so the abort case asserts exit 1 while the paused/completed cases stay at 0 — keeping a single shared helper rather than duplicating the invoke boilerplate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3e303fb to
24d6e85
Compare
|
@mnriem Rebased onto current What the rebase touched:
All prior Copilot fixes are intact (paused/aborted guard, message coercion, |
Address Copilot review: - A run paused by an older version has no persisted step `type`, so `_gate_outcome` would never surface its gate block on resume. Add `_is_gate_step`: prefer the `type` field, but when it is absent fall back to the gate's unique output signature (`on_reject`, written only by GateStep). A record with a different known `type` is still not a gate. - Normalize `options` to a list of strings (mirroring the `message` coercion) so an unvalidated workflow with non-string options can't destabilize the JSON schema. - Tests: options coercion, type-less gate detection, and a type-less non-gate negative case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@mnriem Pushed
Full |
Address Copilot review: the prior options normalization only mapped a `list`, returning the raw value for any other shape (scalar/tuple), which contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`: None stays None; list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is one option, never iterated character-by-character). The emitted schema is now always list[str] | None. Extend the options test to cover list, tuple, bare string, numeric scalar, and None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@mnriem Pushed
Full |
| output = step.get("output") or {} | ||
| # `message` and `options` may be non-string YAML literals in an unvalidated | ||
| # workflow (GateStep coerces neither for the payload), so normalise both | ||
| # here for a stable JSON schema: message → str, options → list[str] | None. | ||
| message = output.get("message") | ||
| return { | ||
| "step_id": state.current_step_id, | ||
| "message": None if message is None else str(message), | ||
| "options": _normalize_gate_options(output.get("options")), | ||
| "choice": output.get("choice"), | ||
| } |
| steps: | ||
| - id: fine | ||
| type: shell | ||
| run: "true" |
Address Copilot review: - `_gate_outcome` normalized `message` and `options` but passed `choice` through as-is; an unvalidated gate can record a non-string `choice`, which contradicts the stable-schema rationale. Coerce `choice` to `str | None` (None still means "no decision yet"), consistent with the other two fields. Adds a focused choice-coercion test. - The plain (no-gate) test workflow used `run: "true"`, which fails under cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform `run: "exit 0"` (matching the exit-code suite's workflows). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@mnriem Pushed
Full |
Description
Reference implementation for #2964 — for discussion, direction welcome.
When a run pauses at a gate, the
--jsonoutcome now carries agateblock (step_id/message/options/choice) so orchestrators can detect "human review needed" and present the options without parsing the human-facing stream. Two small pieces:typein the run state's step results (one added line instep_data— previously the type was not recoverable from state)._workflow_run_payloadadds thegateblock via a_gate_outcomehelper when the run's current step is a gate.choicepopulates when the outcome ends at the gate with a decision recorded (e.g. an interactive rejection withon_reject: abort→ afailedpayload carrying"choice": "reject"; anon_reject: retrypause likewise). A mid-flow approval proceeds past the gate, so the block clears — by design. Non-gate runs and runs that end elsewhere are unchanged — nogatekey, payload byte-identical to today.The issue lists alternatives (a generic
paused_stepblock; a dedicated status value) — happy to rework toward either.Testing
uv sync && uv run pytest— full suite 3727 passedTestWorkflowRunGateOutcomeJson): a gate pause carries the exact block (CliRunner stdin is non-TTY, so the gate pauses); a completed run has nogatekey — the gate-pause test is red against currentmain, green with the change (verified both directions)uvx ruff check src/— cleanuv run specify --helpworkflow run --json)AI Disclosure
Code, tests, and this description were authored with AI assistance (Claude); verified by running the repo's test suite and ruff locally in both red and green directions.